PREFACE


This study is a part of Rand's continuing effort to support weapon
systems analyses and performance predictions with detailed understanding
of all aspects of a problem. For example, many operations (both
military and nonmilitary) depend critically on a human observer's
ability to search for and find a desired object or "target" amid
background clutter within a limited time. Equations are developed in
this Memorandum which permit the calculation of recognition
probabilities as a function of the observed or displayed target contrast
and size (angular subtense), the number of resolution cells across the
minimum dimension of a target, the required search area and available
search time, the false-target density or some other measure of scene
congestion, and the signal-to-noise ratio.

The results should be helpful to both designers and users of all
systems in which visual observation plays a significant role. In
addition, the model can be used to formulate realistic display
requirements for those systems in which a sensor is interposed between
the observer and the real world.


SUMMARY 


This Memorandum presents a model for describing analytically the
capabilities and limitations of a human observer in the task of looking
for and finding known or expected fixed objects. The description takes
the form of six algebraic equations which together enable the user to
estimate recognition probabilities as a function of the many parameters
required to describe a specific situation. The model is tailored to
the case of an airborne observer looking at terrain with or without
optical aids or electro-optical sensors, but with prior knowledge of the
approximate appearance of an object. In Air Force applications, it
estimates the probability that a pilot or observer will be able to say,
"There is the target!"

The model is structured according to three distinguishable
psychophysical processes: deliberate search over a fairly well-defined
area, detection of contrasts (a subconscious retino-neural process), and
recognition of shapes outlined by the contrast contours (a conscious
decision based on comparison with memory). In addition, when the
observer is viewing a displayed image of a scene, noise is usually
present which degrades his performance of these three steps. The
probability that the three steps are completed successfully, multiplied
by a noise degradation factor, gives the probability of target
recognition.

A search term expresses the probability of looking in the right
direction for the target as a function of the desired search rate (with
the area normalized to the target area) and a measure of scene
congestion or false-target density. A contrast term expresses the
probability of spot detection as a function of the ratio of actual to
threshold contrast. The latter is determined by the angular subtense of
the target or its image at the eye. A resolution or shape-recognition
term expresses the probability of recognition as a function of the
number of resolution cells (be they equipment-limited or set by the
observer's eye) contained within the shortest dimension of the target.
A final term gives the degradation in recognition probability caused by
image noise, expressed as a function of the signal-to-noise ratio.

In a narrow sense, the only values that need to be supplied by
the user of this model are the apparent size and contrast of the target
as seen by the observer, the desired search rate, and the congestion
of the scene (defined as the average number of fixation points or false
targets in an area 100 times the area of the target). In practice,
particularly when artificial (e.g., electro-optical) sensors are used,
additional information must be given about the displayed contrast, scale,
resolution, and noise.

In  view  of  the  paucity  and  inconsistency  of  available  experimental 
evidence,  the  accuracy  of  most  inputs  to  the  model  (i.e.,  contrast, 
number  of  resolution  cells,  etc.)  is  expected  to  be  no  better  than 
20  to  30  percent;  estimates  of  the  congestion  factor,  another  input, 
may  well  be  in  error  by  a  factor  of  two  or  so  in  either  direction. 

Hence  the  real  utility  of  the  model  is  in  setting  bounds  to  what  should 
be  expected  of  observers  in  real  situations. 

When applied reiteratively to successive designs in a systems
context, the model serves to define, albeit loosely at present, the
requirements that a human observer places on any system which he must
operate.

SYMBOLS


A_g = glimpse aperture
A_s = area to be searched
a_T = area of target
C = observed contrast (apparent or displayed)
C_0 = intrinsic (zero-range) contrast
C_T = threshold contrast
f = spatial frequency
G = congestion factor, number of fixation centers per 100 a_T
K = a constant, set equal to 2.3
k = number of target areas (a_T) in an average glimpse aperture (A_g),
    i.e., A_g/a_T
k_0 = nominal value of k, set equal to 100
M = average number of fixation centers per target area
[symbol illegible] = number of resolution cells in shortest target
    dimension
P_1 = probability that an observer looks in the direction of the target
    with his foveal vision (see p. 3)
P_2 = probability that a target viewed foveally for one glimpse period
    is detected (see p. 3)
P_3 = probability that a detected target is recognized (see p. 3)
P_R = probability of target recognition
t = time
u = dummy integration variable
α = angular subtense of target or image at the eye
η = overall degradation factor arising from noise in the image viewed
    by an observer


I.  INTRODUCTION 


In many operations, success depends on a human observer's quickly
finding a certain object in a scene or in some image of that scene.
In Air Force operations, armed reconnaissance and many kinds of strike
missions depend critically on the timely identification of a target (or
its image) by an airborne observer. Whether he is observing directly
with his unaided vision, using optical aids, or viewing the display
produced by an intervening sensor (e.g., television, radar, or any other
imaging transducer), the same capabilities for visual search,
discrimination, and recognition are involved. No matter how complex or
sophisticated the sensor in front of him or the computer and other
mechanisms behind him (e.g., for measuring coordinates or rates or for
aiming weapons), the most crucial (and least understood) step in the
whole operation is his conscious decision that "There is the target!"

The  purpose  of  this  Memorandum  is  to  propose  a  model  that  describes 
analytically  the  performance  of  a  human  observer  in  such  a  task  as  a 
function  of  a  number  of  well-defined  and  measurable  parameters. 

While no subjective act can be analyzed completely, the kind of
situation described in the above paragraph is one in which the usually
cited sources of variability and unpredictability in human behavior
are minimized. By contrast, the task of monitoring an empty scene or
display (waiting for something to happen) would be extremely difficult
to model because the observer is so quickly subject to boredom and to
"wandering" of an otherwise unoccupied mind. But the present case
requires active search in a structured field for a known (or briefed)
specific object, or perhaps for any of a class of familiar objects,
such as trucks on a road. In either case, the task is carried out for
a fairly short period of time under conditions of very strong
motivation. Under such circumstances, the variability in individual
performance and the difficulty in specifying that performance may well
be less than the variability between scenes and the difficulty in
quantitatively describing the content and the degree of congestion in
typical pieces of terrain. In this Memorandum formulas are proposed
which, when the inputs available to the observer are completely known,
permit an estimation of the probability of his recognizing a target as
a function of these inputs.

The proposed analytical expressions, which constitute the model
of the observer developed in this Memorandum, can provide valuable
assistance not only to designers of display equipment, but also to
designers, purchasers, and users of complete systems. However, the
limitations of this model should also be recognized. First, it does not
attempt to provide a mechanistic analogue of an observer. All that is
required of it (and all it provides) is an estimation of recognition
probabilities. Second, as has been indicated, the user of the model
must provide estimates of the pertinent observable properties of a
scene, or its displayed image, as follows: In direct viewing, the
size and apparent contrast of the target, the required search rate,
and the congestion of the scene are all that are needed; when an
intermediate display is used, the displayed target size (scale) and
contrast, system resolution, and signal-to-noise ratio (S/N) must also
be given. The model describes only the observer, but by so doing
provides an essential portion of the analysis that must be employed in
evaluating any manned system.

It should not be inferred from the foregoing statements that this
is the first or only such model. Many existing models have been
utilized in formulating the present one. It differs from others,
however, in its conceptual approach at some important points, and it
reflects a conscious effort to structure the model according to
distinguishable psychophysical processes. It is these conceptual
differences, including the selection of pertinent variables, which may
justify the presentation of yet another model of the human observer.

II. THE MODEL


The performance of a human observer is often a very complicated
function of many interacting variables. In order to simplify this
difficult situation and yet stay reasonably close to reality, we consider
explicitly the task described in the introduction: the finding of
known and fixed objects in a complex field in a short time. This
process, even when so restricted, is still complex, but it can be
considered to consist of the following three distinct steps: deliberate
search over a fairly well-defined area, detection of contrasts (a
subconscious retino-neural process), and recognition of shapes outlined
by the contrast contours (a conscious decision based on comparison with
memory). In addition, when the observer is viewing a displayed image
of a scene, noise is usually present which degrades his performance of
all three of these steps.

On the basis of assorted experimental data, four formulas can be
devised: three for the probabilities of completing each of the three
steps separately, and one for a noise degradation factor. It is
postulated that the overall target recognition probability can be
expressed by the product of these four terms. Accordingly, we establish
the following definitions:

1. P_1 is the probability that an observer, searching an area
that is known to contain a target, looks for a specified glimpse time
(viz., 1/3 sec) in the direction of the target with his foveal vision.
P_1 is a function of the ratio of an acceptable search rate to that
demanded in a given situation; the loosely defined concept of foveal
vision is replaced by that of an effective glimpse aperture.

2. P_2 is the probability that if a target is viewed foveally for
one glimpse period it will, in the absence of noise, be detected. It
is determined by psychophysical limits operating on the observed or
displayed target size and contrast.

3. P_3 is the probability that if a target is detected it will be
recognized (again during a single glimpse and in the absence of noise).
Recognition is usually (but not necessarily) accomplished on the basis
of intrinsic shape without reliance on context.

4. η is an overall degradation factor arising from any noise in
the image that is viewed by the observer.

We then write for P_R, the probability of target recognition,

    P_R = P_1 \cdot P_2 \cdot P_3 \cdot \eta        (1)

Inasmuch as (1) the first three steps described above are
independent events, (2) P_2 and P_3 (as defined) each represent a
conditional probability under the one preceding it, and (3) η is an
overall degradation factor, the product formulation of Eq. (1) is
obvious and rigorously correct. This is so despite the fact that the
individual terms are not strictly independent in the sense that they may
be functions of some of the same variables (contrast and S/N, for
example). This and certain other subtle interactions are discussed
briefly in Section III of this Memorandum. In the following subsections
the nature of each of the four terms is examined in some detail, and a
specific analytical expression is developed for each one.
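
The product form of Eq. (1) can be sketched in a few lines of Python. This is only an illustration: the function name, the range check, and the sample values are assumptions made here, and the check presumes that the noise factor η, like the three probabilities, lies between 0 and 1.

```python
def recognition_probability(p1: float, p2: float, p3: float, eta: float) -> float:
    """Eq. (1): P_R = P_1 * P_2 * P_3 * eta.

    p1  -- probability of looking in the direction of the target
    p2  -- probability of detecting a target that is looked at
    p3  -- probability of recognizing a detected target
    eta -- noise degradation factor (assumed here to lie in [0, 1])
    """
    for name, value in (("p1", p1), ("p2", p2), ("p3", p3), ("eta", eta)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return p1 * p2 * p3 * eta


# Made-up inputs, chosen only to show the call; they are not measured data.
example_p_r = recognition_probability(p1=0.9, p2=0.8, p3=0.7, eta=1.0)
```

Because the terms multiply, the smallest of the four sets an upper bound on P_R regardless of how favorable the others are.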

THE  SEARCH  TERM 

The first term, P_1, describes the search limitations; the primary
concern is structured search. By contrast, in free search, large
objects (such as clearings in woods) or objects with outstanding
contrast are usually spotted first by peripheral vision and are then
examined more carefully. In such cases, a "visual lobe" theory* of
detection is appropriate in which successive looks in random directions
are postulated and off-axis detections are significant. Indeed, such
a model (Ref. 1) was used effectively in the analysis of some classified
visual reconnaissance tests in which most of the targets were highly
visible once found. That is not the kind of situation treated here,
nor are moving targets to be considered. Motion cues are recognized to
be quite important (in fact, often overriding) but are not included.

*Visual lobe theory was developed for completely unstructured
search, such as horizon search at sea or search of the sky in daylight;
its application to search of terrain, even under the conditions
mentioned, is therefore somewhat suspect.

When searching from the air for a terrestrial object whose
location is known only approximately, it becomes both possible and
necessary to utilize foveal vision and to search fairly systematically.
The maximum acuity of foveal vision is a necessity, since there is
always a need to find targets at the earliest possible moment during
approach, and at long range either the apparent size or the available
contrast or both may be marginal; few military targets really stand
out. Foveal vision is also usually feasible, since only a limited area
needs to be covered. The required area may be as much as the whole of
an electro-optical display, but more commonly it is an area set by
navigation errors and target location uncertainties, centered on a
predicted or expected target location. Even under these conditions,
however, search rates are extremely variable and almost intractable for
the fundamental reason that pieces of terrain (not to mention possible
targets) differ widely and almost defy quantification. Nevertheless,
some bounds can be set.

It is well known that the eye moves in discrete steps, ordinarily
with about three stops, called fixations, per second. (Actually an
observer occasionally takes longer to examine certain points, but this
does not affect very much the average search rates described below.)
Our approach, therefore, is to postulate that an experienced observer
searches by moving an apparent aperture (essentially his foveal vision)
in some fairly regular pattern over the area of interest, and
furthermore that he adjusts his average interfixation distance, and
hence the effective size of his scanning aperture and his overall search
rate, in accordance with his a priori information on the size and
contrast of the target or its image. Intuitively, one recognizes that
an observer will scan the floor around him differently if he is looking
for a pencil or an ant. Stated more formally, the observer estimates how
far off his visual axis he will still have an adequate probability of
detecting the expected image, and he automatically adjusts his search
rate accordingly. A key concept, therefore, is the size of the
effective scanning aperture, here called a glimpse aperture, A_g. This
is a quantity that commonly ranges from 10 to 100 times the area of the
target, a_T, but can sometimes vary between 1 and 1000 times a_T.


The reason for this huge spread is not just the observer's
inability to predict the nature of the image or his own detection
probabilities. It lies in a second important factor: the structure,
complexity, or "congestion" of the surrounding scene. The search for an
ant mentioned above will also be quite different depending on whether
the floor is covered with a nearly featureless linoleum or a textured
and patterned rug. However, this "congestion" cannot be described solely
by the two-dimensional spatial-frequency content in a scene. What
really matters is the density of contrast points (the natural fixation
centers for the eye) or other "confusion objects" that are present in
the scene. The writer once experienced a striking example of many such
false targets (natural decoys, as it were) while flying over the
notorious Coso Range in California. This region contains scattered trees
and bushes which appear very dark against the background of sandy soil
or dried grass, as do the vehicles and "bridges" which were placed in
the area as "targets." Almost every tree had to be examined to see
whether or not it had straight sides before the true targets could be
found. Indeed, tests there have produced some of the lowest target
acquisition probabilities ever measured (Ref. 5).

The kind of adaptive search rate described here, in which the
observer automatically reacts to both the character of the scene and the
(anticipated) nature of the target imbedded in that scene, has been
advocated informally by this writer for several years. The only
independent reference to such a concept found in the literature is by
Williams. He talks about target "conspicuity," which is measured
by the rate at which a particular target can be successfully searched
for in a particular field, and he points out that the commonly observed
lack of dependence of target acquisition on display scale factor (within
limits, and assuming no change in information content on the display)
is another manifestation of observer adaptation. Other experimenters,
of whom Richardson is an important example, recognize the strong
dependence of search performance on "target class."

A heuristic derivation of an expression for P_1 follows, along with
an indication of the supporting experimental evidence. If an area A_s
is to be searched, the number of glimpses (each of area A_g = k a_T)
required to cover the area is A_s/A_g. The number of glimpses that are
available in t sec, at 1/3 sec per glimpse, is 3t. With perfectly
systematic search, the probability of "looking at" the target (i.e.,
including it within a glimpse aperture) would be just the ratio of the
available glimpses to the total number required, or 3t/(A_s/A_g). This
would give P_1 the form of a linear ramp function with time. Real search
is probably something between perfectly systematic and purely random, so
that P_1 should have a form that lies between the ramp and an exponential
rise. We conservatively adopt the latter and postulate


    P_1 = 1 - e^{-K \cdot 3t / (A_s / (k a_T))}

where  K  is  a  constant  and  k  is  a  parameter  related  to  scene  congestion. 
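
To make the two limiting forms concrete, the following Python sketch evaluates both the linear ramp for perfectly systematic search and the postulated exponential. The function and parameter names are illustrative assumptions, not notation from the Memorandum, and K is left as an explicit argument (the text goes on to set it equal to 2.3).

```python
import math

GLIMPSES_PER_SECOND = 3.0  # one glimpse per 1/3 sec, as stated in the text


def coverage_fraction(t: float, a_s: float, a_t: float, k: float) -> float:
    """Available glimpses over glimpses required: 3t / (A_s / (k a_T))."""
    if min(t, a_s, a_t, k) <= 0.0:
        raise ValueError("t, a_s, a_t and k must all be positive")
    return GLIMPSES_PER_SECOND * t * k * a_t / a_s


def p1_systematic(t: float, a_s: float, a_t: float, k: float) -> float:
    """Linear ramp for perfectly systematic search, capped at 1."""
    return min(1.0, coverage_fraction(t, a_s, a_t, k))


def p1_exponential(t: float, a_s: float, a_t: float, k: float, big_k: float) -> float:
    """Postulated form: P_1 = 1 - exp(-K * 3t / (A_s / (k a_T)))."""
    return 1.0 - math.exp(-big_k * coverage_fraction(t, a_s, a_t, k))
```

With a search area of 3000 target areas, k = 100, and t = 10 sec, the available glimpses exactly cover the area, so the ramp gives 1; the exponential with K = log_e 10 gives 0.9 for the same inputs.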

The exponential form proposed for the dependence of P_1 on t was
predicted by Williams (Ref. 6) and was found by Boynton and Bush. The
dependence on kt (or t/M)* found, though quoted somewhat differently, by
Boynton et al. and by Nygaard, Slocum et al. (Ref. 10), and still
differently by Stathacopoulos et al. (Ref. 11), can be closely
approximated by the identical exponential function. The evaluation of
the coefficient K is accomplished as follows. The previous equation can
be interpreted in terms of search rates as well as total numbers of
glimpses. In that case the exponent is merely K times the ratio of an
"acceptable" or successful search rate, k a_T per 1/3 sec, to the
required rate A_s/t. If "acceptable" is defined as yielding a value of
0.9 for P_1, then k must be selected from measured data for which
P_1 = 0.9 and at the same time K must be set so that, when the real rate
is equal to this acceptable rate, P_1 = 1 - e^{-K} = 0.9. Therefore
K = log_e 10 = 2.3. (If some other definition were adopted for
"acceptable," K and k would change reciprocally, maintaining a constant
product.)

*Alternatively and by completely parallel reasoning, in a scene
(like the Coso Range mentioned above) in which the average density of
confusing objects or fixation centers is M per target area, the number
of glimpses required to cover the area A_s is given by M A_s/a_T. This
leads to an identical expression for P_1 if M is set equal to 1/k. The
effect of extra fixation points is therefore to increase the number of
glimpses per unit area (or to decrease the average interfixation jump
distance) and hence to reduce the effective glimpse aperture and the
areal search rate that can be achieved.

The search rates
measured in Boynton's experiments (Refs. 8, 9), when normalized to the
number of target areas per glimpse time at a 0.9 probability of
success,* yield a value of k = 200. Simon's data (Ref. 12), on the other
hand, dealing with real imagery of very congested scenes (e.g.,
metropolitan Los Angeles), yield a value in the neighborhood of k = 10.
As might be expected, this kind of spread in observed search rates is
not uncommon. Bennett's data (Ref. 13) lead to an average value of 135
for k, while this writer in an old unpublished experiment found k = 40.
Since we think that Boynton's artificial scenes may be unrealistically
low in clutter, we conclude, from the foregoing and a wide range of
similar data, that values of k for real scenes typically fall between 10
and 100, but that values well outside that range are also possible. For
convenience, we write k_0/G for k, where k_0 is a nominal value of k for
which we adopt the figure 100, and G is a "congestion factor" equal to
unity in the nominal case but taking on various positive values, usually
between 1 and 10, for other scenes. Accordingly, we propose the
following expression for P_1:

    P_1 = 1 - e^{-(700/G)(a_T/A_s) t}        (2)

Since by definition G = k_0/k = 100/k (= 100M), it can be visualized
as the average number of fixation centers per nominal glimpse aperture
of 100 a_T, and this indeed constitutes a valid physical definition for
G. In practice, however, it may be little more than a measure of
relative congestion. Values of G less than 1 are possible, as has
already been implied, but these should be invoked by the user sparingly
and only for relatively open scenes: those naturally containing regions
of uniform brightness (e.g., lakes or empty fields) that can be jumped
over quickly, or artificially so by virtue of moving-target indicating
(MTI) radar or multispectral cueing. Values greater than 10 are also
possible, but they are not very common either, except in the sense
that the effective search rate would be quite low whenever significant
decision times are involved, as when examining truly confusing objects
or decoys.

*The experiments cited in this paragraph were all essentially
search-limited; i.e., the targets were easily recognized once they were
actually looked at (fixated upon). In the terms of the present model,
the conditional probabilities P_2 and P_3 were high, approaching unity.
Hence "successful search" can be translated into "looking in the
direction of the target" as required for P_1.
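
As a rough check on Eq. (2), the Python sketch below reproduces its coefficient from the quantities given in the text (three glimpses per second, K = log_e 10, and k_0 = 100) and evaluates P_1. The names are illustrative assumptions, and the `rounded` switch is added here only to show that the text's figure of 700 is the rounded product 3 K k_0.

```python
import math

GLIMPSES_PER_SECOND = 3.0   # one glimpse per 1/3 sec
K = math.log(10.0)          # about 2.3; from requiring 1 - exp(-K) = 0.9
K_NOMINAL = 100.0           # k_0, the nominal glimpse aperture in target areas


def unrounded_coefficient() -> float:
    """3 * K * k_0, which the text rounds to 700."""
    return GLIMPSES_PER_SECOND * K * K_NOMINAL


def search_probability(t: float, a_s: float, a_t: float, g: float = 1.0,
                       rounded: bool = True) -> float:
    """Eq. (2): P_1 = 1 - exp(-(700/G) (a_T/A_s) t).

    t   -- available search time in seconds
    a_s -- area to be searched (same units as a_t)
    a_t -- target area
    g   -- congestion factor G (1 in the nominal case)
    """
    if min(t, a_s, a_t, g) <= 0.0:
        raise ValueError("t, a_s, a_t and g must all be positive")
    coefficient = 700.0 if rounded else unrounded_coefficient()
    return 1.0 - math.exp(-(coefficient / g) * (a_t / a_s) * t)
```

Doubling G halves the exponent, so a factor-of-two error in the congestion estimate translates directly into a factor-of-two error in the effective search time.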

It may be noted that the exponent in Eq. (2) is simply seven times
the reciprocal of the rate at which fixation points must be examined
in order to cover the search area in the time allowed. This
interpretation is theoretically sound and is intriguing in its
simplicity. However, it is probably not very helpful in practice (at
least with the present state of our knowledge) because of the
difficulties in predicting which points in a scene will prove to be
fixation centers. By providing the user with a nominal glimpse aperture
(and search rate), Eq. (2) demands of him only that he estimate
deviations from that nominal by selecting for G a number that, in most
cases, lies between 1 and 10.
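
The fixation-rate reading of the exponent can be checked numerically. In the Python sketch below, which uses illustrative names not found in the Memorandum, the number of fixation centers in the search area is taken as G per 100 a_T (the definition of G given earlier), so the required examination rate is G A_s / (100 a_T t) points per second.

```python
import math


def required_fixation_rate(t: float, a_s: float, a_t: float, g: float) -> float:
    """Fixation points per second needed to cover A_s in t seconds,
    with G fixation centers per 100 target areas."""
    if min(t, a_s, a_t, g) <= 0.0:
        raise ValueError("t, a_s, a_t and g must all be positive")
    return g * a_s / (100.0 * a_t * t)


def p1_from_fixation_rate(rate: float) -> float:
    """Eq. (2) rewritten: the exponent is 7 divided by the required rate."""
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    return 1.0 - math.exp(-7.0 / rate)
```

At a required rate of 3 points per second, which matches the eye's ordinary fixation rate, this form gives a P_1 of about 0.9, in line with the definition of an "acceptable" search rate.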

Speaking realistically, even an experienced observer who can judge
the relative congestion of a given scene with respect to others may
have difficulty in estimating the value of G better than to within about
a factor of two, but this is still much better than having no bounds
whatsoever. In fact, it permits one to draw such general but important
conclusions as these: Broad area search from high-speed aircraft is
rather futile, while road recce or other one-dimensional search may,
on the other hand, be quite feasible up to speeds of a few hundred
knots.*

*Consider, for example, linear search at 10 truck lengths/glimpse,
which corresponds to 600 ft/sec or 350 kn permissible speed; however,
by the same argument, a two-dimensional search for a tank over a swath
width of as little as 1000 ft would be limited to a speed of 70 kn.

THE CONTRAST TERM

The second term, P_2, has to do with the basic process of contrast
detection by the human visual system. Blackwell's classical
experiments (Ref. 14) provide the fundamental data here, yielding curves
of threshold contrast (50-percent detection probability) versus size of
circular discs under various levels of ambient illumination. These are
commonly called "demand" contrast functions. However, there is a good
deal of evidence that the best (i.e., lowest) threshold values obtained by
Blackwell  must  be  adjusted  upward  substantially  for  application  to  the 
practical  situations  discussed  in  this  Memorandum.  A  good  critique  of 
the  pertinent  experiments  on  this  subject  is  that  by  Davies. Fol¬ 
lowing  him  in  part,  we  take  an  average  of  the  data  for  exposures  of 
1/3  sec  obtained  by  Blackwell  and  McCready^'^  and  by  Taylorv^  as 
the  most  relevant  starting  point  (consistent  with  the  search  model  de¬ 
veloped  in  the  previous  section),  assume  pho topic  vision  with  30  to 
100  fL  average  scene  brightness,  and  then  apply  the  following  .torrec- 

/IJjN 

tions.  A  factor  of  2.4  in  contrast  is  suggested  by  Blackwell v  ’  for 
the  difference  between  free-choice  situations  and  the  msre  easily  con¬ 
trolled  but  less  realistic  forced-choice  experiments,  while  he  also 

suggests  a  factor  of  about  1.5  to  allow  for  uncertainties  in  position 

(19) 

or  time  of  target  appearance.  Similarly,  Vos  et  al.  found  that  an 


An aside on the effects at other light levels may be of some interest
at this point. The luminance level chosen above is intended to cover
ordinary daylight seeing and also the (photopic) viewing of bright
electro-optical displays. With more light, the curve of Fig. 1 on
p. 12 shifts downward and to the left, but only slightly. As the
available light decreases, however, the curve moves sharply to the
right and up by a factor that is roughly the square root of the factor
by which the luminance changes. This performance "loss" can be
recovered by electronic gain, as in image intensifiers, up to the
point that the electronic gain is merely amplifying "empty" photon
noise. At this point the performance is limited, not by the eye, but
by the information contained in the arriving photon stream. This new
limit is somewhat different in shape, being approximately hyperbolic
in resolution and contrast (linear on Fig. 1, with a slope of -1), and
of course it depends on the luminance level and on several properties
of the intensifier hardware. For example, following Richards (Ref. 20)
in a slight refinement over the original Rose formula (Ref. 21),

3*40  a 

in which k is the effective S/N (~5 (see p. 16)), D² is the area of
the collecting aperture, e is the electronic charge, τ is the
transmission of the optics, S is the photocathode sensitivity (A/lm),
t is the integration time, and B is the scene luminance (lm/sr/unit
area) of the brighter of two patches just resolved at apparent
contrast C.
overall factor varying between 2.5 and 3.5 in contrast is required to
reconcile certain of Blackwell's data with theirs taken under somewhat
more realistic conditions (but still in a laboratory using uniform
backgrounds). In a flight environment there are still further
degradations, primarily two: the direct blurring effects of vibration
and the inability of an observer to accommodate simultaneously to all
intensity levels when viewing a real scene that probably contains at
least 20 dB of dynamic range. Davies argues, rather vaguely, that
another 60-percent degradation (a factor of 1.6) is little enough to
allow for these and other effects, and we agree. It is proposed,
therefore, that the shape of the "demand" curve of threshold contrast
C_T versus angular subtense α in minutes of arc (min) be taken from
the average of the two best-known sources of 1/3-sec data, and that
this curve be adjusted upward by a factor of about 5.5 in contrast, or
that 0.75 be added to log contrast. The resulting curve is plotted as
a dashed line in Fig. 1.
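
As a rough arithmetic check on the size of this adjustment, the
following Python sketch multiplies the individual correction factors
quoted above. The text does not spell out how the separate factors
were combined into the single value of about 5.5, so the two products
shown are only illustrative bracketing values, and all variable names
are ours.

```python
# Correction factors quoted in the text (dimensionless contrast multipliers).
FREE_CHOICE = 2.4        # Blackwell: free-choice vs. forced-choice
POSITION_TIME = 1.5      # Blackwell: uncertainty in position or time
VOS_RANGE = (2.5, 3.5)   # Vos et al.: overall laboratory factor
FLIGHT = 1.6             # Davies: vibration, accommodation, other effects

# Product of Blackwell's two factors and the flight degradation.
blackwell_total = FREE_CHOICE * POSITION_TIME * FLIGHT

# Midpoint of the Vos et al. range times the flight degradation.
vos_total = 0.5 * sum(VOS_RANGE) * FLIGHT

# The alternative statement in the text: add 0.75 to log10(contrast).
log_shift_factor = 10 ** 0.75

print(f"Blackwell-based product : {blackwell_total:.2f}")
print(f"Vos-based product       : {vos_total:.2f}")
print(f"10**0.75                : {log_shift_factor:.2f}")
```

Under these assumptions the two products fall on either side of the
adopted factor, which is consistent with treating it as a rounded
compromise rather than a derived quantity.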

In addition to the evidence that has been cited by Taylor (22) for
various "field factors" of the sort just described, there are some
meager flight test data by Heap (23), reported more fully by Davies,
the results of which are plotted in Fig. 1. It may also be observed
that clinical optometrists use gray scale prints consisting of 20 1-dB
steps and assert (24) that this is all that can be seen in a "mixed
field." This is not exactly "hard data," but simply corroborative
evidence from another field concerning the coarseness of contrast
discrimination in practical situations.

Since, in the absence of bright lights or specular glint, target
contrasts greater than unity are rarely observed through the real
atmosphere (25), and even less frequently on military targets, the
dashed curve in Fig. 1 can be approximated by the hyperbola

\[ (\log C_T + 2)(\log \alpha + 0.5) = 1 \tag{3} \]


Contrast is defined here as the absolute value of the difference
between target and background luminances divided by the background
luminance.
which is shown by the solid curve in Fig. 1. This simplification is
often convenient and usually adequate, but whenever contrasts greater
than unity are important (for example, on certain electro-optical
displays), a more accurate curve with an asymptotic slope of -1/2
should be used.
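
For machine use, the hyperbola of Eq. (3) can be solved for the
threshold contrast directly. The Python sketch below does this under
two assumptions of ours: that the logarithms are to base 10 (which is
consistent with the statement that adding 0.75 to log contrast
corresponds to a factor of about 5.5), and that only the branch with
log α > -0.5 is meaningful. The function name is illustrative.

```python
import math

def threshold_contrast(alpha_min: float) -> float:
    """Threshold contrast C_T from Eq. (3); alpha in minutes of arc.

    Assumes base-10 logarithms and is meaningful only on the branch
    log10(alpha) > -0.5, i.e. alpha greater than about 0.32 min.
    """
    x = math.log10(alpha_min) + 0.5
    if x <= 0.0:
        raise ValueError("alpha is below the asymptote of the hyperbola")
    # (log C_T + 2) * x = 1  =>  log C_T = 1/x - 2
    return 10.0 ** (1.0 / x - 2.0)

for alpha in (1.0, 10 ** 0.5, 10.0, 10 ** 1.5):
    print(f"alpha = {alpha:6.2f} min  ->  C_T = {threshold_contrast(alpha):.4f}")
```

With these assumptions, a spot subtending 1 min of arc requires a
contrast of unity, and the required contrast falls by a factor of 10
when the subtense grows to about 3.2 min.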

It is obvious from Fig. 1 that a better fit could be obtained between
the two curves. Very simply, if the hyperbola were shifted 0.1 to the
right, by changing the 2 to 1.9 in Eq. (3), an excellent though still
not optimum agreement with the "true" curve would result. But, in
accordance with the old-fashioned concept of significant figures, one
should not imply a precision of results that is not justified. In view
of the way the dashed curve was derived, it may be no more accurate
than 20 or 30 percent, so it really should be drawn with an air brush.
Accordingly, with the present state of our knowledge, no greater
accuracy should be inferred for Eq. (3) than is indicated.

The probability of detection, P_2, at the threshold contrast is, by
definition, 50 percent. The probability of detection for other values
of observed contrast, C, has been shown by Blackwell and McCready to
depend only on the ratio C/C_T and to have the form of the cumulative
normal distribution with P_2 = 0.9 for C/C_T = 1.5. This is equivalent
to setting the value of the Gaussian standard deviation equal to 0.39,
and it indicates that on the average Blackwell's subjects chose to
operate at a false-alarm rate of about 1/200, corresponding to an S/N
of roughly 2.6:1. Further support for the general form of the
dependence, based on statistical decision theory, is provided by
Ory (26). Accordingly, we write

\[ P_2 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{[(C/C_T)-1]/0.39} e^{-u^2/2}\, du \tag{4a} \]

A useful approximation that is more suitable for machine computation
is the following:

\[ P_2 \approx \frac{1}{2} \pm \frac{1}{2} \left\{ 1 - e^{-4.2[(C/C_T)-1]^2} \right\}^{1/2} \tag{4b} \]
where the minus sign is used when C < C_T. C is the actual contrast
available at the eye after atmospheric effects (25) and (when
pertinent) equipment gains and display settings are accounted for;
C_T is computed from Eq. (3) with α giving the average angular
subtense, in minutes of arc at the eye, of an object or its displayed
image.
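
The two forms of the detection term can be compared numerically. The
following Python sketch evaluates Eq. (4a) with the standard error
function and Eq. (4b) in closed form; the function names are ours, and
the contrast ratios are arbitrary sample values.

```python
import math

SIGMA = 0.39  # Gaussian standard deviation quoted in the text

def p2_exact(ratio: float) -> float:
    """Eq. (4a): cumulative normal evaluated at (C/C_T - 1)/0.39."""
    z = (ratio - 1.0) / SIGMA
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p2_approx(ratio: float) -> float:
    """Eq. (4b): closed-form approximation; minus sign when C < C_T."""
    root = math.sqrt(1.0 - math.exp(-4.2 * (ratio - 1.0) ** 2))
    return 0.5 + math.copysign(0.5 * root, ratio - 1.0)

for r in (0.5, 1.0, 1.5, 2.0):
    print(f"C/C_T = {r:.1f}:  exact {p2_exact(r):.3f}   approx {p2_approx(r):.3f}")
```

The coefficient 4.2 agrees with the familiar closed-form approximation
to the normal integral, 1/2 ± 1/2 [1 - exp(-2z²/π)]^(1/2), since
2/(π × 0.39²) is about 4.19.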

Line-of-sight masking by terrain or foliage is really outside the
purview of this model; however, since it has the effect of reducing
the observed target area, it can be thought of as reducing P_2.
Similarly, camouflage may reduce the observable contrast to some very
low value or may alter the apparent shape of an object. The subject of
shape recognition is discussed next.

THE  RESOLUTION  TERM 

The third term, P_3, has to do with the more subjective act of
deciding what particular image forms represent in the real world. But
since we are primarily concerned with shape recognition of known or
briefed objects, as distinct from the interpretation of unfamiliar
imagery, the problem can be reduced to the visibility (or
detectability in the sense of the previous subsection) of sufficient
geometrical detail for shapes to be compared with memory and thereby
recognized. The concept of "sufficient" detail might lead one into the
morass of "critical details," those unique features that permit
various classes of objects to be distinguished one from another.
However, when all portions of an image are equally detectable so that
the whole shape is either visible or not, Johnson (27) has
demonstrated the remarkable fact that, for a variety of military
objects,* a single parameter (namely N_r, the number of resolution
cells contained in the shortest dimension across a target) is all that
is required to describe what constitutes "sufficient" detail for
detection or for recognition. He found values of N_r between 3.3 and
4.8, or 4.0 ±20 percent, for high-

*One should probably add "in a military context." The amount of
detail required to distinguish a truck from an oxcart is far less than
that required to discriminate between various truck models; but the
simple separation of objects into classes is usually sufficient for
designating targets.
confidence recognition. This important simplification has been further
confirmed by Brainerd et al. (28) for several target shapes, and these
authors also provide enough data points to support a simple Gaussian
form for the dependence of P_3 on the parameter N_r. Oatman (29) has
performed similar experiments which are only slightly more pessimistic
(N_r larger by 25 percent) than the first two, provided that his
numerical results are corrected for the deterioration of his TV
display away from the center of the tube face. We adopt a conservative
value, close to Oatman's, and write


\[ P_3 = \begin{cases} 1 - e^{-[(N_r/2)-1]^2}, & N_r \ge 2 \\ 0, & N_r < 2 \end{cases} \tag{5} \]

which makes P_3 ≈ 0.9 when N_r = 5.
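
A direct transcription of Eq. (5) into Python is sketched below as a
convenience for tabulating the shape-recognition term; the function
name and the sample values of N_r are ours.

```python
import math

def p3_shape(n_r: float) -> float:
    """Eq. (5): shape-recognition term for N_r resolution cells."""
    if n_r < 2.0:
        return 0.0
    return 1.0 - math.exp(-((n_r / 2.0 - 1.0) ** 2))

for n in (2, 3, 4, 5, 6, 8):
    print(f"N_r = {n}:  P_3 = {p3_shape(n):.3f}")
```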

It is important to emphasize the meaning of N_r. As previously
defined, it is the number of resolution cells contained in the minimum
dimension (e.g., width or height) of the projected image of an object
to be recognized. In the present context, "resolution cells" means
independently detectable spots, the subject of the previous
discussion. Pure resolution (in the original sense of separating two
spots), though related, is not directly involved here, nor is
resolution as determined from a bar chart the appropriate measure to
be used in calculating N_r. The proper procedure is to calculate
first, from Eq. (3) or Fig. 1, the size of the smallest spot that can
be seen at the contrast level with which the target is presented to
the observer. Next, for reasons discussed in the following paragraph,
this spot size should be corrected for a 90-percent probability of
detection, rather than using the threshold (50-percent) value.
Finally, the number of these 90-percent detectable spots contained in
the shortest dimension of the target image then gives the value of
N_r. This procedure is illustrated graphically in Fig. 2 (page 19) and
is described more fully starting on page 3 8.
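
The three steps just listed can be sketched in Python for the simplest
case, in which the displayed contrast does not depend on image size.
The sketch rests on assumptions of ours: base-10 logarithms in
Eq. (3), and the reading that a 90-percent detection probability
corresponds to C/C_T = 1.5, so that the smallest 90-percent-detectable
spot is the one whose threshold contrast is C/1.5. Function names and
sample inputs are illustrative.

```python
import math

def spot_size(threshold_contrast: float) -> float:
    """Invert Eq. (3): subtense (min of arc) whose threshold contrast is C_T.

    Assumes base-10 logarithms and C_T > 0.01.
    """
    y = math.log10(threshold_contrast) + 2.0
    if y <= 0.0:
        raise ValueError("contrast too low for the hyperbola of Eq. (3)")
    # (log C_T + 2)(log alpha + 0.5) = 1  =>  log alpha = 1/y - 0.5
    return 10.0 ** (1.0 / y - 0.5)

def resolution_cells(contrast: float, min_dimension_min: float) -> float:
    """N_r: number of 90-percent-detectable spots across the shortest dimension."""
    # 90-percent detection is taken as C/C_T = 1.5, so the spot must have
    # a threshold contrast of C/1.5.
    alpha_90 = spot_size(contrast / 1.5)
    return min_dimension_min / alpha_90

n_r = resolution_cells(contrast=0.15, min_dimension_min=15.0)
print(f"N_r = {n_r:.2f}")
```

When the displayed contrast varies with image size, the spot size must
instead be found by intersecting the two curves, as in Fig. 2.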

The  choice  of  90  percent  as  the  level  of  detection  probability 
to be used in determining the effective resolution in any specific
situation is to some extent arbitrary. However, if it were as low as,
for example, 50 percent, clearly only half of the spots would be
visible at any instant. This would violate the condition (stated on page
14)  that  the  "whole  shape"  should  be  either  visible  or  not.  Although 
exact  results  would  depend  on  the  properties  of  any  noise  that  might 
be  present  and  on  the  ability  of  the  observer  to  integrate  out  these 
effects,  it  can  be  expected  (in  the  absence  of  detailed  experimental 
evidence)  that  the  individual  spot-detection  probabilities  should  be, 
say,  85  percent  or  greater  in  order  for  the  cited  measurements  to  apply 
and  for  the  simple  form  of  Eq.  (5)  to  be  valid.  The  variations  per¬ 
mitted  within  the  remaining  uncertainty  fall  well  within  the  overall 
accuracy  limits  claimed  for  this  model. 

The  need  for  distinguishing  carefully  between  the  various  possible 
measures  of  resolution,  as  was  done  in  the  preceding  paragraphs,  arises 
from  the  fact  that  bar-chart  resolution,  particularly  when  observed 
with  converging  bars  as  in  the  common  TV  test  patterns,  is  quite  dif¬ 
ferent  from — and  significantly  more  optimistic  than — the  resolution 
determined  from  random  spot  detection.  As  pointed  out  explicitly  by 
Rosell (30), the difference lies in the ability of the human visual
system to integrate over a completely known and heavily redundant bar
pattern (to accept gaps in a bar or even whole missing bars) and so to
effectively operate at a much lower S/N ratio than is possible when
almost every "corner" or other detail of an arbitrary shape must be
detected independently in order for the shape to be correctly
observed. The difference seems to be a factor of about 4 or 5. This
number can be derived from a direct comparison of the value of
S/N = 1.2 quoted by Parton and Moody (31), loosely based on their
bar-chart measurements, with the classical work of Rose (21) on spot
detection; or it can be simply estimated, as was apparently done by a
group of RCA engineers (32), from the fact that overall S/N is
proportional to the square root of the area observed and from the
finding of Coltman and Anderson (33) that the eye uses efficiently the
area of about 5 bars or line pairs.
An interesting confirmation of both the concept that has been
described and the numerical value that has been adopted in Eq. (5) can
be derived from the work of Steedman and Baker (34). In their
experiment, the resolution is clearly defined by the cell size of
their computer-generated patterns, increased by some fraction (which,
within limits, does not make much difference) of the blur circles that
are artificially added. If their data (see their Table I) are examined
in detail, it is observed that those targets with the small angular
subtense (which require the longest search times and induce the most
errors) are also those that consist of a small number of elements or
cells. Furthermore, at their well-known "cutoff" size of 12 min of
arc,* the average number of resolution cells (with due allowance for
the blur circles) is between 5 and 6. Above this cutoff, they found an
almost constant search time for a given shape, and an error rate of 2
to 4 percent; below this value, they found a marked increase in both
quantities. Correspondingly, Eq. (5) predicts a P_3 ≈ 0.95 for this
value of N_r, which drops rapidly to about 0.3 for half that value of
N_r and to zero for N_r = 2. The latter corresponds to Johnson's (27)
criterion for detection only, with no shape recognition per se.

A special case is that of long, narrow objects which, in the limit,
reduce to lines. These are a great deal easier to recognize, primarily
because of the same redundancy effect mentioned above. This effect in
one dimension, combined with moderate (not threshold) levels of
contrast, gives rise to the commonly observed value of N_r = 0.2 for
this case.

In  the  process  of  applying  the  foregoing  model  of  an  observer  to 
a  practical  situation  involving  an  artificial  electronic  or  electro- 
optical  sensor,  it  would  be  helpful  to  construct  a  diagram  similar  to 


*They actually use the longest target dimension, which subtends 12 min
of arc under ideal conditions, and they suggest that 20 min of arc
might be a more practical value. We interpret the 12 min of arc as the
subtense across the minimum dimension of the target under realistic
conditions, which is probably reasonable for most commonly shaped
objects for which the "aspect ratio" is less than 2:1.
Fig. 2. A description of this diagram will serve as a good summary of
the model as it has been described up to this point. First, one must
calculate the displayed contrast, C, for a hypothetical target with
intrinsic (zero-range) contrast, C_0, with respect to its contiguous
background. This calculation involves power levels, receiver or
detector sensitivity, atmospheric attenuation and path luminance
effects (if any), the transfer characteristic of the system for the
particular "gain" and "contrast" settings chosen by the operator, and
the modulation transfer function (MTF), i.e., the system response as a
function of spatial frequency or reciprocal target size. The result is
an overall transfer function plotted on a graph of contrast versus
target image subtense at the eye, α. Typical curves for various
possible measured (or postulated) values of C_0 are plotted as thin
solid lines in Fig. 2.* Next, one computes the actual target (image)
average subtense at the eye, say α', and enters Fig. 2 at this
abscissa. Reading the appropriate contrast curve, one finds the value,
C', with which that target will be presented to the observer. C'_T,
the threshold contrast for an object of apparent size α', is obtained
from Eq. (3), and the ratio C'/C'_T permits calculation of P_2 through
Eq. (4a) or (4b). Equation (3) can also be plotted on Fig. 2 for all
values of α; this demand contrast is shown as the heavy solid curve.
The stippled area covers the band of 0.5 ≤ (C/C_T) ≤ 1.5, which, by
Eq. (4a), represents the region for which 0.1 ≤ P_2 ≤ 0.9. This can be
used for finding P_3 in the following manner. If the appropriate
displayed-contrast curve is followed to its intersection with the
stippled area, the abscissa of that intersection (say, α*) will
represent the useful resolution that can be achieved on the subject
display (with targets of inherent contrast C_0). The ratio α'/α*
(corrected, if necessary, for target aspect ratio) is N_r, the
parameter which, when inserted in Eq. (5), yields the value of P_3.
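
The graphical construction of Fig. 2 can be imitated numerically. The
Python sketch below finds α* by bisection as the point where a
displayed-contrast curve meets the upper edge of the stippled band
(C = 1.5 C_T), which is our reading of the construction. The
displayed-contrast curve used here, its parameters c0 and alpha_c, and
the target subtense are entirely hypothetical stand-ins for a real
system transfer function; base-10 logarithms are assumed in Eq. (3).

```python
import math

def demand_contrast(alpha: float) -> float:
    """Threshold ("demand") contrast from Eq. (3); alpha in min of arc."""
    return 10.0 ** (1.0 / (math.log10(alpha) + 0.5) - 2.0)

def displayed_contrast(alpha: float, c0: float = 0.3, alpha_c: float = 2.0) -> float:
    """HYPOTHETICAL transfer curve: contrast rolls off for small images."""
    return c0 * (1.0 - math.exp(-alpha / alpha_c))

def useful_resolution(lo: float = 0.5, hi: float = 60.0) -> float:
    """Bisect for alpha* where displayed contrast equals 1.5 x demand contrast."""
    def margin(a: float) -> float:
        return displayed_contrast(a) - 1.5 * demand_contrast(a)
    for _ in range(60):
        mid = math.sqrt(lo * hi)   # bisect in log(alpha)
        if margin(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi

alpha_star = useful_resolution()
alpha_target = 12.0   # assumed target subtense, min of arc
ratio = displayed_contrast(alpha_target) / demand_contrast(alpha_target)
n_r = alpha_target / alpha_star
print(f"alpha* = {alpha_star:.2f} min, C'/C'_T = {ratio:.2f}, N_r = {n_r:.2f}")
```

The ratio C'/C'_T obtained this way would then be inserted in Eq. (4a)
or (4b), and N_r in Eq. (5).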

It was implied at the beginning of this section that recognition
in unfamiliar situations may be much more complicated, and far more
*For unaided vision, only the atmospheric reduction in contrast need
be computed, and the left-hand intercepts determined accordingly; the
transfer "functions" will then be horizontal straight lines on Fig. 2
out to the point where shimmer sets in.


Fig.  2 — Schematic  representation  of  displayed  and  "demand"  contrast 

versus  displayed  target  size 


difficult to predict, than the mere detection of shape details. An
extreme example might be the classical one of the photo-interpreters
searching for completely unknown elements of the Peenemunde launching
areas during World War II. No attempt is made to extend this model to
cover such cases. It should also be mentioned, however, that under
certain other circumstances recognition may be very much easier than
this model would predict. Consider the approach of unauthorized
aircraft, or the presence of vehicles along a road in enemy territory.
Both are cases in which the mere detection of objects might be
sufficient to justify the decision, "There is a target!" These cases
can be handled by assigning artificially high values to P_3 (when the
prior information so justifies), thus effectively equating detection
as given by P_2 to recognition. This point is discussed further in
Section III. Our model of P_3 covers the more common intermediate
cases in which shape provides the primary criterion for recognition.

THE  NOISE  TERM 

The last term of our model, η, describes the ability of an observer to
integrate out those unwanted fluctuations usually referred to as
noise. More accurately, it describes the difficulty of reading through
any noise that may be present in the image being viewed. This includes
both equipment-generated noise and real but unpredictable fluctuations
in the scene itself. Amplifier noise, TV beam effects, and
photographic grain are examples of the former; amplified photon noise
and the graininess of coherent imagery (laser or synthetic-aperture
radar) are examples of the latter. True photon noise is not pertinent
at this point, since the model in its present form applies only to
photopic vision (observing daylight scenes or bright displays), which
is apparently processor-limited and thus sensitive to contrast rather
than being noise-limited (26).

Image noise, whatever its source, affects the recognition processes
in many ways. First, it increases the apparent congestion of a scene,
G, and thus reduces P_1. Second, it increases the threshold contrast,
C_T, required for spot detection, and so reduces P_2. Third, by
distorting contrast boundaries and generally lowering gradients or
acutance, it increases the required value of N_r (essentially, the
denominator in the exponent of Eq. (5)) and so reduces P_3 as well.
Rather than specifying each of these effects in detail, we take the
expedient course of proposing a single overall degradation factor, η.
Whenever image noise exists, this factor is to be applied to the
recognition probabilities estimated by the product P_1 P_2 P_3.

Most  of  the  work,  both  analytical  and  experimental,  on  the  effects 
of  noise  on  image  interpretation  is  concerned  only  with  threshold  con¬ 
ditions  for  which  the  probability  of  detection  is  0.5.  Data  on  the 
effect  of  other  than  threshold  values  of  noise  on  the  probability  of 
detection  are  not  easy  to  come  by,  but  there  are  a  few.  Coltman  and 
Anderson show that the S/N per unit area that is tolerable for
detection of an image is inversely proportional to the linear dimension
of  the  image.  Since  total  S/N  can  thus  be  traded  directly  for  image 
size,  one  can  conclude  that  the  dependence  of  detection  probability 
on  S/N  should  have  the  same  form  as  that  of  image  size,  namely  the 
form  of  Eq.  (5).  However,  in  view  of  the  paucity  of  good  empirical 
data  on  this  point,  the  author  prefers  a  slightly  more  conservative 
formulation (predicting lower probabilities at modest values of S/N),
which can be had by reducing the exponent in the exponential from 2 to
1. In addition, a few measurements by Schade (35), as replotted by
Stathacopoulos et al., do fit very well on the resulting curve. We
therefore adopt the form


In direct vision (at high light levels), with no equipment or image
noise to be accounted for, this S/N is infinite and η = 1.
III. DISCUSSION


Equations  (2)  to  (6),  combined  as  indicated  by  Eq.  (1),  constitute 
the  proposed  model  of  a  human  observer.  The  fundamental  concepts  and 
the  basic  product  formulation  are  explained  at  the  beginning  of  this 
Memorandum.  The  result,  after  analysing  each  of  the  four  terms,  is  an 
expression  for  the  probability  of  recognition  as  a  function  of  several 
observable  quantities — the  apparent  size  and  contrast  of  the  target, 
the  required  search  rate  and  the  false-target  density  in  the  scene,  and 
the  resolution  and  noise  appearing  on  any  intervening  display. 

An important and useful property of the model is the separation of
variables that has been achieved. Each of the terms is expressed as a
function of a rather small number of input parameters, and target size
is the only parameter which appears in more than one term. This rather
significant simplification arises from a careful consideration of the
consequences of the product formulation and a detailed evaluation of
each of the terms over only the ranges of the input variables for
which that term is controlling or otherwise of interest.

For example, the model is not applicable to a target that is so
isolated or whose contrast is so high (relative to the background
clutter) that it can easily be seen with peripheral vision, since in
that case the search rate can be very much faster than postulated in
Eq. (2) and P_1 will be very high. But then P_2 and η will also be
very high (essentially unity), and the problem is almost trivial. The
search model assumes only that target contrast is not that high, so
that fairly systematic and fine-grained search must be carried out. In
fact, the actual search rate employed by an observer is determined by
some sort of average false-target density over the scene. If the
actual contrast of a specific target against its contiguous background
turns out to be less than sufficient for recognition to take place
during a single properly directed glimpse, this fact will show up in
P_2 and P_3, which will correctly reduce the value of the overall
recognition probability.

*This is essentially what is achieved by multispectral cueing or by
MTI radar, as indicated on p. 8.
The relationship between the conditional probabilities, P_2 and P_3,
can be discussed in a similar way. While it is clear that they are
intimately related, they are separated, with further separation of
variables, for several reasons. Ordinarily, detection not only
precedes but also dominates shape recognition. That is, unless P_2 is
rather high, there is probably no point in even calculating P_3 or η,
since the overall recognition probability will be too low to justify
the sortie. When P_2 is high, then P_3 controls. On the other hand, as
has been mentioned, there are cases for which a priori or contextual
information may suffice to obviate the need for shape recognition per
se. In such cases (boats on a river or trucks on a road, for example)
P_3 can be ignored (i.e., set to unity without regard for Eq. (5)) and
P_2 will control. By keeping the two terms separate, model flexibility
is preserved. Further arguments for this separation revolve around the
role of resolution. First, as a practical matter, most man-made sensor
systems are resolution-limited, since resolution always costs
something. (This is true at least of systems whose displays are
properly designed.) Accordingly the sometimes-difficult calculation of
system MTF need be applied only once (namely, when it is most
critical) in the shape-recognition term. More importantly, there are
many cases with multiscaled or zoom-capable systems in which the
combination of a priori information and required search area may make
a two-step identification of the target desirable. In such cases an
initial and tentative detection on a wide field of view is followed
and confirmed (or denied) by shape recognition on a magnified image.
At the first step P_2 controls, but P_3 is incomplete; at the second
step P_3 controls.

Finally, η affects P_1, P_2, and P_3 as has been mentioned, but it
is kept separate merely for convenience. In fact, all four terms, as
they are defined, are not only functions of different variables, but
are also subject to different kinds of uncertainties and will require
different experiments for their future refinement. Yet the product
of  the  four  provides  a  viable  model  for  a  wide  variety  of  circumstances; 
it  can  be  used  in  predicting  the  capabilities  of  a  broad  class  of  manned 
systems,  since  it  deals  only  with  the  observer  and  the  information  pre¬ 
sented  to  him,  whether  this  be  directly  to  his  unaided  eyes  or  through 
optical  aids  or  sophisticated  artificial  sensors. 
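
The product formulation lends itself to a very short routine. The
Python sketch below simply multiplies the four terms and, as an
optional switch of our own devising, replaces P_3 by unity for the
cases in which contextual information makes shape recognition
unnecessary. The function name, the switch, and the sample term values
are illustrative and are not taken from any experiment.

```python
def recognition_probability(p1: float, p2: float, p3: float,
                            eta: float = 1.0,
                            shape_needed: bool = True) -> float:
    """Product of the four terms; each must lie between 0 and 1.

    When context alone identifies the target (trucks on a road, for
    example), shape_needed=False replaces P_3 by unity.
    """
    for name, value in (("p1", p1), ("p2", p2), ("p3", p3), ("eta", eta)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not shape_needed:
        p3 = 1.0
    return p1 * p2 * p3 * eta

# Illustrative (made-up) term values.
with_shape = recognition_probability(0.8, 0.9, 0.6)
context_only = recognition_probability(0.8, 0.9, 0.6, shape_needed=False)
print(f"shape required: {with_shape:.3f}   context suffices: {context_only:.3f}")
```

Because the terms are kept separate, each can be replaced by a better
estimate as new experimental data become available without disturbing
the others.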
It has been emphasized, nevertheless, that the applicability of
this  model  is  restricted  to  structured  search  (as  in  air-to-ground  ap¬ 
plications)  for  fixed  objects  whose  appearance  is  at  least  approximately 
known  (as  in  acquiring  pre-briefed  targets)  under  conditions  of  time- 
urgency  (e.g.,  prior  to  weapon  delivery).  In  quite  different  contexts, 
such  as  monitoring  static  situations,  unstructured  search  (as  in  look¬ 
ing  for  aircraft  against  a  completely  homogeneous  sky  background),  and 
examination of unfamiliar imagery by photo-interpreters, this model
will  be  quite  inadequate.  Also,  the  accuracy  of  these  predictions,  or 
the  lack  thereof,  should  be  kept  firmly  in  mind.  As  judged  from  the 
degree  of  consistency  of  the  available  experimental  data,  it  has  been 
indicated  that  most  of  the  terms  of  the  model  are  correct  to  within 
some 20 to 30 percent (1σ, measured at the inputs: contrast, number
of  resolution  cells,  etc.)  and  that  the  search  rates  may  well  be  in 
error  by  a  factor  of  two  or  so  in  either  direction.  Hence  the  real 
utility  of  the  model  is  in  setting  bounds  on  what  should  be  expected 
of  observers  in  "real-time"  situations. 
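
To indicate how an input uncertainty of this size propagates, the
Python sketch below evaluates the detection term of Eq. (4a) at a
nominal contrast ratio and at that ratio perturbed by ±25 percent. The
25-percent figure is our choice of a representative value within the
quoted 20 to 30 percent, and the nominal ratio of 1.5 is arbitrary.

```python
import math

def p2(ratio: float, sigma: float = 0.39) -> float:
    """Detection probability from the cumulative normal of Eq. (4a)."""
    return 0.5 * (1.0 + math.erf((ratio - 1.0) / (sigma * math.sqrt(2.0))))

nominal_ratio = 1.5   # C/C_T giving P_2 of about 0.9
rel_error = 0.25      # assumed 1-sigma input uncertainty

low = p2(nominal_ratio * (1.0 - rel_error))
mid = p2(nominal_ratio)
high = p2(nominal_ratio * (1.0 + rel_error))
print(f"P_2 bounds: {low:.2f} <= {mid:.2f} <= {high:.2f}")
```

A spread of this width in a single term is one reason for using the
model to set bounds rather than to make point predictions.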

No overall "validation" of this model, in the sense of completely
controlled field tests, is known to exist. Of course, the several
pieces of the model are based on experimental evidence, including such
flight tests as are pertinent, but better operational data are badly
needed. Field trials, carefully designed with some sort of predicting
model in mind, and with all the pertinent parameters recorded, are a
necessity. If such programs could be funded, it could be hoped that
eventually there might emerge a quantitative understanding of observer
performance along the lines of Ory's (26) treatment of threshold
visual performance. At present, however, this appears to be no more
than a distant gleam.

The  difficulties  encountered  in  attempting  to  predict  recognition 
probabilities  are  manifest  and  well  known.  Nevertheless,  this  simpli¬ 
fied  model  of  the  observer,  when  properly  combined  with  data  on  targets, 
backgrounds,  the  atmosphere,  and  the  performance  of  specific  sensors,  is 
believed  to  be  capable  of  setting  bounds  on  feasibility  that  are  prac¬ 
tically  useful.  When  applied  reiteratively  to  successive  system  designs, 
the  model  serves  to  define — albeit  loosely  at  present — the  requirements 
placed  on  any  system  which  is  to  be  operated  by  a  human  observer. 